Taking care of your backups
Many programmers write a lot of code, regularly. Unfortunately, a lot of this code is written for nothing. Why? Because the code gets lost. Usable code gets lost.

The goal of this article is to make you aware of the risks of not making backups, and to show you the many methods to at least have some backups, so that your code does not get lost. I'm going to cover Windows only. Maybe someone else can write a nice document for Linux, including backup cron jobs, etc.

How can you lose your code?

Data corruption
Your computer suddenly shuts down because of overheating, a power surge, or an error, just as it is writing your code to the hard drive. The chance that your file(s) end up as a "FILE0001.chk" after ScanDisk has run is quite large. It's safe to assume that your code is lost at that point.

Theft
Your computer (or laptop) gets stolen, with all your code on it. Very unfortunate, especially if your only backup is a few months old.

Data destruction
For whatever reason, your data is completely gone: your PC catches fire, lightning strikes and blows up your computer, etc. Since the hard disk is likely to be physically damaged, there isn't much hope of getting it working again, although some people seem to have luck replacing the hard disk's circuit board with one from a disk of the same type.

Accidental deletion
Yes, it happens. Sometimes, for whatever reason, you end up deleting something you didn't want to delete, thinking you were deleting something else. On Windows machines you have the Recycle Bin to lean on, but there's always the danger of emptying it before realizing what you've done.

Although you can never be 100% sure that you won't lose your code, you can at least try to keep extra copies.

Different copies on a hard disk
One option is simply to keep a second directory of your code somewhere on the same physical hard disk.
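Keeping such a second directory up to date is simple enough to sketch. The following Python function is a minimal one-way sync, assuming both folders are ordinary local directories and that the target is not inside the source (the name `mirror` is hypothetical, not part of any backup tool). It copies files that are missing or older in the copy, and never deletes anything from the copy, which is what protects you from accidental deletions.

```python
import shutil
from pathlib import Path


def mirror(source: Path, target: Path) -> list[Path]:
    """One-way sync: copy each file under `source` into `target` when the
    copy is missing or older. Nothing is ever deleted from `target`, so a
    file you delete by accident in `source` survives in the copy.
    Assumes `target` is not located inside `source`."""
    copied = []
    for src in source.rglob("*"):
        if not src.is_file():
            continue
        dst = target / src.relative_to(source)
        if dst.exists() and dst.stat().st_mtime >= src.stat().st_mtime:
            continue  # the copy is already up to date
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also copies the modification time
        copied.append(dst)
    return copied
```

Because `shutil.copy2` preserves modification times, a second run only copies files that changed since the previous one. Scheduled daily, e.g. `mirror(Path(r"C:\Code"), Path(r"C:\CodeBackup"))`, it behaves like the daily sync described here; pointing `target` at a second physical disk gives the disk-1-to-disk-2 copy as well.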
Keeping that copy current can be done with a backup program that syncs the files daily. This method is simple, basically free, and protects you from accidental deletions. The downside is that if the hard disk crashes, you still lose all your files.

Different copies on a different hard disk
With this method, two hard disks are installed in the same computer. These disks can be combined with RAID (if they are SATA disks and your motherboard supports it, or if they are SCSI disks). The problem with RAID configured as a mirror is that if you delete files accidentally, they are deleted from the other hard disk too! The two drives are simply identical to each other. The good thing about a mirror is that you are much less likely to lose code when one hard disk crashes, since it's unlikely that two hard disks crash at the same time (unless lightning strikes or there is a fire). With theft, you still lose all of your code.

A better approach is to copy the files from disk 1 to disk 2 every day, so you can lose at most one day of code (which can still be a lot, but a lot less than months of work).

Copy to a different machine
If you have an old machine, you could configure it as a backup server! This machine is only used to store backup data, for example from multiple computers. The good thing is that the files are on a completely separate hard disk and computer. There is also less chance that the backup machine gets stolen along with your main one, especially if it is not located next to your other computer(s).

The backup machine shouldn't be directly connected to the internet, though. You could, but I wouldn't risk it. In my current setup, the machine sits behind a NAT router and a simple firewall, just in case. That way I can still run virus scanner updates, while the computer simply doesn't allow most other programs to send or receive data.

Rewritable media
Burning backups to rewritable media is generally a good idea. All of the techniques mentioned above still suffer from problems like lightning strikes, theft (more or less), fire, flooding, you name it. Maybe you think I'm exaggerating, but I'm sure plenty of GPWiki visitors live in places where extreme things can happen: forest fires, earthquakes, a dike breaking. You get the point.

So something bad is happening, and all your backups are on your backup machine or work machine. You can't throw your PC out of the window in a hurry, since it's all tied up with cables. And you really don't have time to get a screwdriver and remove the hard drive from the computer either! (A removable or hot-swappable hard disk rack might work for this, though.) The best solution is to regularly put your data on a DVD/CD. If you have a spare tape drive, that is fine too.

The danger of writing backups to rewritable media is the manual factor. If you have so much data to back up that it needs more than one CD or DVD, you will have to swap discs. That is something you might not even do, because you are busy with something else. I have this problem often. When it starts to thunder heavily outside, I pull the plugs of the network cables and the power (call me paranoid, but I just value my data), but when the thunder is really close, even a nearby strike might generate a huge EM spike that sends your data into 'oblivion'.

Also note that you should keep at least two CDs/DVDs with backups. Why? If you burn the same disc over and over, and one time that DVD does not work, you are stuck with a backup on your hard drive... or nothing! Never overwrite the current backup with a new one; always keep the previous copy. For maximum protection, use the Grandfather-Father-Son backup system: rotate three or more discs, overwriting the oldest one each time. This way you always have two spare backups.
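The rotation described above boils down to "always overwrite the disc that holds the oldest backup". A minimal Python sketch, assuming you keep track of when each disc was last written (the disc labels and the `disc_to_overwrite` helper are hypothetical):

```python
from datetime import date


def disc_to_overwrite(last_written: dict[str, date]) -> str:
    """Return the label of the disc whose backup is the oldest.

    `last_written` maps a disc label (for example, written on the disc
    with a marker) to the date of the backup currently on it.
    """
    if len(last_written) < 3:
        # With fewer than three discs you no longer keep two spare backups.
        raise ValueError("rotate at least three discs")
    # Compare the labels by their backup date; the oldest backup loses.
    return min(last_written, key=lambda label: last_written[label])
```

For example, with discs A, B and C last written on 1, 8 and 15 March, the helper picks A; once A has been rewritten, B becomes the oldest and is next in line. The two most recent backups are therefore never the ones being overwritten.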
If you deal with large amounts of data that change regularly, do a full backup once a month and a differential or incremental backup each week to keep media costs down.

Back to paranoid mode: you've got DVDs. Quite a few. Some have code, others have your e-mail or old school documents (nice to remember), and gigabytes of vacation photos. If they aren't stored together in one place, you can still consider your backups 'semi-useless'. If your hard drive crashes, you still have plenty of time to find the correct media and restore the data. If your house is on fire, you won't. Therefore, get a simple CD case. It might cost a few dollars (around $10?), but if everyone in the house knows where it is and you keep your backups in it, you can always grab it quickly and take it with you (assuming it is not in the attic or the cellar...). Maybe you could store some other valuable things in there too. Hardware can be replaced; data, such as digital photos, cannot.

Geographical separation
If you have lots of important files that do not change often (older digital photos, for example), keep a copy in a different place: at the house of your grandparents, relatives, friends, or wherever else comes to mind. This is best done by keeping the DVDs/CDs (or a complete old hard drive, if you like) in a special box, stored somewhere in the attic or so... Encrypting the files might be needed too; that makes sure no one can look at your files at all.

The Internet

FTP
Some backup tools (or a tool you write yourself) can connect to FTP servers and upload the backup there. This is a great way to geographically separate your backups. The best place to upload your backups is outside the publicly accessible web directory (\wwwroot, \public_html), so that no one can access your backups simply by pointing a browser at the file. Of course, they shouldn't know the filename anyway, but it's some extra reassurance for you.
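Uploading a backup to an FTP server can be scripted with Python's standard `ftplib` module. The sketch below is only an illustration: `upload_backup`, the `PUBLIC_DIRS` list and the remote directory names are assumptions, not part of any particular backup tool or host. Following the advice above, it refuses to upload into a directory that looks like a public web root.

```python
import ftplib
from pathlib import Path, PurePosixPath

# Directory names that usually map to a public web root. This list is an
# assumption; check how your own host is laid out.
PUBLIC_DIRS = {"public_html", "wwwroot", "www", "htdocs"}


def upload_backup(ftp: ftplib.FTP, archive: Path, remote_dir: str) -> str:
    """Upload a local backup archive into `remote_dir` on an open,
    logged-in FTP connection and return the remote path."""
    parts = PurePosixPath(remote_dir.replace("\\", "/")).parts
    if any(part.lower() in PUBLIC_DIRS for part in parts):
        # A file inside the web root could be downloaded by anyone
        # who guesses its URL.
        raise ValueError(f"{remote_dir!r} looks like a public web directory")
    ftp.cwd(remote_dir)
    with archive.open("rb") as fh:
        # storbinary switches the transfer to binary (TYPE I) mode,
        # so the archive arrives byte-for-byte unchanged.
        ftp.storbinary(f"STOR {archive.name}", fh)
    return f"{remote_dir.rstrip('/')}/{archive.name}"
```

With a real server that supports FTPS, you would pass in an `ftplib.FTP_TLS` connection on which you have called `login()` and then `prot_p()`, so the file transfer is encrypted as well; plain `ftplib.FTP` sends your password and files unencrypted.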
I would also encrypt the backup files before uploading them, for even more reassurance. If the server is shared with other users, they may be able to access your home directory too, for example.

Version Control Systems
A more sophisticated option (yet not that hard to set up if you already have a server you can reach over the internet) is a version control system like CVS. Even if you are the only person working on your code, keeping a repository on an external server means you keep a very recent copy of your data away from your own machine. Most version control systems can be configured to use SSL connections, which help keep other people from accessing your repository's data. Setting up the server will be the hard part, but after that, keeping your backups up to date will be trivial. The downside of this setup is that the repository history will be lost if you lose the server, but in that event there should still be a local copy on your computer, so your most recent work will not be lost. Also, these systems usually don't handle binary data very well, so storing mostly images and other binary data will make the repository bloat considerably.

My backup setup
This is an example of how I take care of my backups:
* Every 2 days, my e-mail and documents are compressed and stored on a different hard drive in my PC. This normally happens in the evening while I'm drinking coffee, so I don't even notice the sluggishness.
* Code (VB/C++/web server directories) is compressed and copied daily to this other hard drive. Previous backups are kept for 5 versions (5 days).
* Photos are copied weekly. These are quite a few gigabytes; no older versions are kept. JPGs do not compress well, so they are added to a zip file with no compression at all.
* Some of my files (only the code that gets updated daily, and some other regularly changing data) are uploaded, compressed and encrypted, to the server running my site, outside the publicly accessible directory. This server is running somewhere in Amsterdam, which should give nice geographical separation.
* Every Saturday, all files are copied to the backup server. This backup server has two hard drives (two old, slow 20 GB ones).
* The backup server has a nice motherboard that can turn itself on automatically. I'm not using that, though, since I believe I can only specify a time. I'm also considering "Wake-on-LAN", so the computer turns itself on when it receives a special data packet.
** Backups of multiple PCs are stored on this machine on Saturday.
** The backup server is also turned on on Sunday, when it mirrors its first disk to its second disk.
* The weakest link: my manual DVD burning process. This is something I'd like to automate someday.
* An aluminum CD/DVD case holds the most important data, ready to be picked up when needed.

Used software
I always used to use HandyBackup for my backup needs. It's a truly great program. Unfortunately, for my current setup I'd need two licenses (this machine and the backup machine), and you probably know how being a student works regarding money. You should try the demo, though, and see if it suits you.

Now I'm using SyncBack. A freeware version of this program is available, offering most of the options that HandyBackup has. My personal downside is that it relies on the Windows Scheduler, but that shouldn't be a real problem. The configuration is also a bit less intuitive. But it's free!

The backup server simply runs a Windows 2000 installation, and the files are copied to it using the FTP server that comes with Windows 2000. That way we don't need all kinds of logon scripts to connect to this machine before we can copy files...

AVG is a freeware virus scanner, which I think is one of the best I've come across. I believe the free version does not install on Windows 2003, though. The virus scanner is just one extra level of security: all computers in this house already run it, but you never know.

ZoneAlarm
The only computer with a firewall is the backup server, just to make sure there is no 'odd' data exchange...

Home-made shutdown tool
This Visual Basic program pings the IP range of the networked computers about every 10 minutes. If none of them reply, it shuts the computer down, because that means there are no computers online anymore that can send backups (or someone pulled out the switch or the cables).

What should you back up?
Here is a list you can use as a reference. Maybe I forgot something; if so, please add it. Please keep in mind that this article is meant for Windows.
[ ] Favorites (e.g. C:\Documents and Settings\username\Favorites)
[ ] Address book (e.g. C:\Documents and Settings\username\Application Data\Microsoft\Address Book)
[ ] Apache public directory (if you are running Apache; it contains all your websites)
[ ] MySQL data directory (if you have MySQL installed, MySQL\data contains all your database data)
[ ] IIS \wwwroot directory containing your ASP sites/code
[ ] Your programming code, whether you're using VB, C++...
[ ] Photos. If you have a digital camera, please make copies of your photos. Really!
[ ] Documents. Do not forget to copy the documents you make.
[ ] E-mail.
[ ] IM data. I've got lots of ICQ history, and I don't want to lose that. Ditto for MSN logs.

Useful tip: if you have reinstalled Windows recently, or plan to do so, put your important data folders on a different partition (e.g. D:\). I've configured Eudora to store its mailbox data outside C:\, so it's less likely to be deleted by a format of C:\ combined with a sudden case of "forgetting to copy the e-mail...". Likewise, my "My Documents" folder is not stored on C:\ either. The same goes for code: keep it organized separately on a single hard disk (or partition). It makes getting an overview of your code a lot easier!

If you are using Linux, it may be enough to back up your home partition.
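The checklist above can be turned into a small script that the Windows Scheduler runs for you. Below is a minimal Python sketch; `archive_folders`, the folder paths and the archive naming scheme are hypothetical. It zips a list of folders into a dated archive and keeps only the newest few archives, similar to the 5-version retention in the setup described earlier. It assumes `dest_dir` (ideally on another disk) is not inside any of the folders being archived.

```python
import zipfile
from datetime import date
from pathlib import Path
from typing import Optional


def archive_folders(folders: list[Path], dest_dir: Path, prefix: str,
                    compress: bool = True, keep: int = 5,
                    day: Optional[date] = None) -> Path:
    """Zip `folders` into dest_dir/<prefix>-<YYYY-MM-DD>.zip, then delete
    all but the newest `keep` archives that share the same prefix."""
    day = day or date.today()
    dest_dir.mkdir(parents=True, exist_ok=True)
    archive = dest_dir / f"{prefix}-{day.isoformat()}.zip"
    # ZIP_STORED adds files without compression (useful for JPGs, which
    # barely shrink); ZIP_DEFLATED compresses code and documents.
    method = zipfile.ZIP_DEFLATED if compress else zipfile.ZIP_STORED
    with zipfile.ZipFile(archive, "w", compression=method) as zf:
        for folder in folders:
            for path in sorted(folder.rglob("*")):
                if path.is_file():
                    # Keep each folder's own name inside the archive,
                    # e.g. "Code/game.cpp".
                    zf.write(path, path.relative_to(folder.parent).as_posix())
    # ISO dates sort chronologically as text, so the oldest archives come first.
    archives = sorted(dest_dir.glob(f"{prefix}-*.zip"))
    for stale in archives[:-keep]:
        stale.unlink()
    return archive
```

For example, a daily task could call `archive_folders([Path(r"D:\Code")], Path(r"E:\Backups"), "code")`, while a weekly photo job could pass `compress=False, keep=1`, matching the "no compression, no older versions" approach for photos.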
Closing words
Don't forget: a backup scheme is only as good as its weakest link. Really try to get your backup configuration straight; many people on the GPWiki forums will agree, after the bad luck they had in the past. I hope this article motivates some of the programmers who visit the GPWiki to review their backup scheme, and to understand that anything done manually is just asking for problems.